Optimal Sequential Wireless Relay Placement 
on a Random Lattice Path 



Abhishek Sinha*, Arpan Chattopadhyay* , K. P. Naveen*, Marceau Coupechoux^ and Anurag Kumar* 
*Dept. of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560012, India. 
Email: {abhishek.sinha.iisc, arpanc.ju}@gmail.com, {naveenkp, anurag} @ece.iisc.ernet.in 
^Telecom ParisTech and CNRS LTCI, Dept. of Informatique et Reseaux, 23, avenue d'Italie, 75013 Paris, France.

Email: marceau.coupechoux@telecom-paristech.fr 



Abstract — Our work is motivated by the need for impromptu 
(or "as-you-go") deployment of relay nodes (for establishing 
a packet communication path with a control centre) by fire- 
men/commandos while operating in an unknown environment. 
We consider a model, where a deployment operative steps along 
a random lattice path whose evolution is Markov. At each step, 
the path can randomly either continue in the same direction or 
take a turn "North" or "East," or come to an end, at which point 
a data source (e.g., a temperature sensor) has to be placed that 
will send packets to a control centre at the origin of the path. 
A decision has to be made at each step whether or not to place 
a wireless relay node. Assuming that the packet generation rate 
by the source is very low, and simple link-by-link scheduling, we 
consider the problem of relay placement so as to minimize the 
expectation of an end-to-end cost metric (a linear combination of 
the sum of convex hop costs and the number of relays placed). 
This impromptu relay placement problem is formulated as a 
total cost Markov decision process. First, we derive the optimal 
policy in terms of an optimal placement set and show that this 
set is characterized by a boundary beyond which it is optimal to 
place. Next, based on a simpler alternative one-step-look-ahead 
characterization of the optimal policy, we propose an algorithm 
which is proved to converge to the optimal placement set in a 
finite number of steps and which is faster than the traditional 
value iteration. We show by simulations that the distance based 
heuristic, usually assumed in the literature, is close to the optimal 
provided that the threshold distance is carefully chosen. 

Index Terms — Relay placement, Sensor networks, Markov 
decision processes, One-step-look-ahead. 

I. Introduction 

Wireless networks, such as cellular networks or multihop 
ad hoc networks, would normally be deployed via a planning 
and design process. There are situations, however, that require 
the impromptu (or "as-you-go") deployment of a multihop 
wireless packet network. For example, such an impromptu 
approach would be required to deploy a wireless sensor 
network for situational awareness in emergency situations such 
as those faced by firemen or commandos (see [1], [2]). For
example, as they attack a fire in a building, firemen might 
wish to place temperature sensors on fire-doors to monitor 
the spread of fire, and ensure a route for their own retreat; 
or commandos attempting to flush out terrorists might wish 
to place acoustic or passive infra-red sensors to monitor the 
movement of people in the building. As-you-go deployment 
may also be of interest when deploying a multi-hop wireless 
sensor network over a large terrain (such as a dense forest)
in order to obtain a first-cut deployment which could then be
augmented to a network with desired properties (connectivity
and quality-of-service).

Fig. 1. A wireless network being deployed as a person steps along a
random lattice path. Inverted V: location of the deployment person; solid line:
path already covered; circles: deployed relays; thick dashed path: a possible
evolution of the remaining path. The sensor to be placed at the end is also
shown as the black rectangle.

With the above larger motivation in mind, in this paper we 
are concerned with the rigorous formulation and solution of 
a problem of impromptu deployment of a multihop wireless 
network along a random lattice path; see Fig. 1. The path
could represent the corridor of a large building, or even a 
trail in a forest. The objective is to create a multihop wireless 
path for packet communication from the end of the path 
to its beginning. The problem is formulated as an optimal 
sequential decision problem. The formulation gives rise to a 
total cost Markov decision process, which we study in detail in 
order to derive structural properties of the optimal policy. We 
also provide an efficient algorithm for calculating the optimal 
policy. 

A. Related Work 

Our study is motivated by "first responder" networks, a concept
that has been around at least since 2001. In [2], Howard et
al. provide heuristic algorithms for the problem of incremental 
deployment of sensors (such as surveillance cameras) with the 
objective of covering the deployment area. Their problem is 
related to that of self-deployment of autonomous robot teams 
and to the art-gallery problem. Creation of a communication
network that is optimal in some sense is not an objective in
[2]. In a somewhat similar vein, the work of Loukas et al.
is concerned with the dynamic locationing of robots that, in 
an emergency situation, can serve as wireless relays between 
the infrastructure and human-carried wireless devices. The 
problem of impromptu deployment of static wireless networks 
has been considered in 0, 0, 0, 0. In 0, Naudts et al. 
provide a methodology in which, after a node is deployed, 
the next node to be deployed is turned on and begins to 
measure the signal strength to the last deployed node. When 
the signal strength drops below a predetermined level, the next 
node is deployed and so on. Souryal et al. provide a similar 
approach in 0, 0, where an extensive study of indoor RF 
link quality variation is provided, and a system is developed 
and demonstrated. The work reported in Q is yet another 
example of the same approach for relay deployment. More 
recently, Liu et al. describe a "breadcrumbs" system for
aiding firefighters inside buildings, which is similar to our present
paper in terms of the class of problems it addresses. In a
survey article [1], Fischer et al. describe various localization
technologies for assisting emergency responders, thus further 
motivating the class of problems we consider. 

In our earlier work (Mondal et al. [9]) we took the first steps
towards rigorously formulating and addressing the problem 
of impromptu optimal deployment of a multihop wireless 
network on a line. The line is of unknown length but prior 
information is available about its probability distribution; at 
each step, the line can come to an end with probability p, at 
which point a sensor has to be placed. Once placed, the sensor 
sends periodic measurement packets to a control centre near 
the start of the line. It is assumed that the measurement rate 
at the sensor is low, so that (with a very high probability) 
a packet is delivered to the control centre before the next 
packet is generated at the sensor. This so-called "lone packet
model" is realistic for situations in which the sensor makes a 
measurement every few seconds. 

The objective of the sequential decision problem is to 
minimise a certain expected per packet cost (e.g., end-to-end 
delay or total energy expended by a node), which can be 
expressed as the sum of the costs over each hop, subject to 
a constraint on the number of relays used for the operation. 
It has been proved in [9] that an optimal placement policy 
solving the above mentioned problem is a threshold rule, i.e., 
there is a threshold r* such that, after placing a relay, if the 
operative has walked r* steps without the path ending, then a 
relay must be placed at r*. 

B. Outline and Our Contributions 

In this paper, while continuing to assume (a) that a single 
operative moves step-by-step along a path, deciding to place 
or to not place a relay, (b) that the length of the path is a 
geometrically distributed random multiple of the step size, (c) 
that a source of packets is placed at the end of the path, (d) 
that the lone packet traffic model applies, and (e) that the total 
cost of a deployment is a linear combination of the sum of 
convex hop costs and the number of nodes placed, we extend
the work presented in [9] to the two-dimensional case. At each
step, the line can take a right angle turn either to the "East" or
to the "North" with known fixed probabilities. We assume a
Non-Line-Of-Sight (NLOS) propagation model, where a radio
link exists between two nodes placed anywhere on the path;
see Fig. 2. The lone packet model is a natural first assumption,
and would be useful in low-duty cycle monitoring applications.
Once the network has been deployed, an analytical technique
such as that presented in [10] can be used to estimate the
actual packet carrying capacity of the network.

Fig. 2. A depiction of relay deployment along a random lattice path with
NLOS propagation. (Legend: source node, sink node at (0,0), relay nodes,
NLOS links, random path.)

We will formally describe our system model and problem
formulation in Section II. The following are our main contributions:

• We formulate the problem as a total cost Markov decision
process (MDP), and characterize the optimal policies in
terms of placement sets. We show that these optimal
policies are threshold policies and thus the placement sets
are characterized by boundaries in the two-dimensional
lattice (Section III). Beyond these boundaries, it is optimal
to place a relay.

• Noticing that placement instants are renewal points in the
random process, we recognize and prove the One-Step-Look-Ahead
(OSLA) characterization of the placement
sets (Section IV).

• Based on the OSLA characterization, we propose an iterative
algorithm, which converges to the optimal placement
set in a finite number of steps (Section V). We have
observed that this algorithm converges much faster than
value iteration.

• In Section VII we provide several numerical results 
that illustrate the theoretical development. The relay 
placement approach proposed in 0, 0, 0, Q would 
suggest a distance threshold based placement rule. We 
numerically obtain the optimal rule in this class, and 
find that the cost of this policy is numerically indistin- 
guishable from that of the overall optimal policy provided 
by our theoretical development. It suggests that it might 
suffice to utilize a distance threshold policy. However, the 
distance threshold should be carefully designed taking 
into account the system parameters and the optimality 
objective. 

For the ease of presentation we have moved most of the proofs
to the Appendix.

II. System Model

We consider a deployment person, whose stride length is 
1 unit, moving along a random path in the two-dimensional 
lattice, placing relays at some of the lattice points of the path 
and finally a source node at the end of the path. Once placed, 
the source node periodically generates measurement packets 
which are forwarded by the successive relays in a multihop 
fashion to the control centre located at (0,0); see Fig. 2.

A. Random Path 

Let $\mathbb{Z}_+$ denote the set of nonnegative integers, and $\mathbb{Z}_+^2$ the
nonnegative orthant of the two-dimensional integer lattice. We
will refer to the x direction as East and to the y direction as
North. Starting from (0, 0) there is a lattice path that takes
random turns to the North or to the East (this is to avoid the
path folding back onto itself, see Fig. 2). Under this restriction,
the path evolves as a stochastic process over $\mathbb{Z}_+^2$. When the
deployment person has reached some lattice point, the path
continues for one more step and terminates with probability
p, or does not terminate with probability 1 - p. In either case,
the next step is Eastward with probability q and Northward
with probability 1 - q. Thus, for instance, (1 - p)q is the
probability that the path proceeds Eastwards without ending.
The person deploying the relays is assumed to keep a count of
m and n, the number of steps taken in the x direction and in
the y direction, respectively, since the previous relay was placed.
He is also assumed to know the probabilities p and q.
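
As an illustration of the path model just described, the following Python sketch samples one realisation of the random lattice path. The function name `sample_path`, the use of `random.Random`, and the parameter values are assumptions made for this example only; just the probabilities p and q come from the model.

```python
import random

def sample_path(p, q, rng):
    """Sample one lattice path starting at (0, 0).

    Each step goes East w.p. q and North w.p. 1 - q; after every step
    the path ends w.p. p, so a path always has at least one step.
    """
    path = [(0, 0)]
    while True:
        m, n = path[-1]
        if rng.random() < q:
            path.append((m + 1, n))   # Eastward step
        else:
            path.append((m, n + 1))   # Northward step
        if rng.random() < p:          # the path ends at the new point
            return path

# Illustrative values (not from the paper): p = 0.2, q = 0.5.
rng = random.Random(0)
lengths = [len(sample_path(0.2, 0.5, rng)) - 1 for _ in range(20000)]
mean_length = sum(lengths) / len(lengths)   # close to 1 / p = 5 steps
```

Under this model the number of steps is geometrically distributed with mean 1/p, which the empirical mean should approximate.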

B. Cost Definition 

In our model, we assume NLOS propagation, i.e., packet 
transmission can take place between any two successive relays 
even if they are not on the same straight line segment of the 
lattice path. In the building context, this would correspond to 
the walls being radio transparent. The model is also suitable 
when the deployment region is a thickly wooded forest where 
the deployment person is restricted to move only along some 
narrow path (lattice edges in our model). 

For two successive relays separated by a distance r, we
assign a cost of d(r) which could be the average delay
incurred over that hop (including transmission overheads and
retransmission delays), or the power required to get a packet
across the hop. For instance, in our numerical work we use
the power cost, $d(r) = P_m + \gamma r^{\eta}$, where $P_m$ is the minimum
power required, $\gamma$ represents an SNR constraint and $\eta$ is the
path-loss exponent. Now suppose N relays are placed such that
the successive inter-relay distances are $r_0, r_1, \ldots, r_N$ ($r_0$ is
the distance between the control centre at (0, 0) and the first relay,
and $r_N$ is the distance from the last relay to the sensor placed
at the end of the path); then the total cost of this placement is
the sum of the one-hop costs $C = \sum_{i=0}^{N} d(r_i)$. The total cost
being the sum of one-hop costs can be justified for the lone
packet model since when a packet is being forwarded there is
no other packet transmission taking place.
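
The cost of a given placement can be evaluated directly from this definition. The sketch below is only an illustration: the names `hop_cost` and `total_cost` and the values of $P_m$, $\gamma$ and $\eta$ are assumptions, and the optional argument `lam` is a hypothetical per-relay price added for convenience.

```python
import math

def hop_cost(r, p_min=0.1, gamma=0.01, eta=2.0):
    """Power cost d(r) = P_m + gamma * r**eta of a hop of length r."""
    return p_min + gamma * r ** eta

def total_cost(nodes, lam=0.0):
    """Sum of one-hop costs along `nodes`, plus `lam` per relay.

    `nodes` lists the control centre first, then the relays in order
    of placement, then the source node at the end of the path.
    """
    hops = sum(hop_cost(math.dist(a, b)) for a, b in zip(nodes, nodes[1:]))
    num_relays = len(nodes) - 2
    return hops + lam * num_relays

# Control centre at (0, 0), one relay at (3, 4), source at (3, 8):
# hop lengths 5 and 4, so C = (0.1 + 0.25) + (0.1 + 0.16) = 0.61.
example = total_cost([(0, 0), (3, 4), (3, 8)])
```

Hop lengths are Euclidean distances between successive nodes, in line with the NLOS assumption.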

We now impose a few technical conditions on the one-hop
cost function $d(\cdot)$: (C1) $d(0) > 0$, (C2) $d(r)$ is convex and
increasing in $r$, and (C3) for any $r$ and $\delta > 0$ the difference
$d(r + \delta) - d(r)$ increases to $\infty$.

(C1) is imposed considering the fact that it requires a non-zero
amount of delay or power for transmitting a packet
between two nodes, however close they may be. (C2) and (C3)
are properties we require to establish our results on the optimal
policies. They are satisfied by the power cost, $P_m + \gamma r^{\eta}$, and
also by the mean hop delay (see [11]).

We will overload the notation $d(\cdot)$ by denoting the one-hop
cost between the locations (0, 0) and $(x, y) \in \mathbb{R}_+^2$ as simply
$d(x, y)$ instead of $d(\|(x, y) - (0, 0)\|)$. Using the conditions on
$d(r)$ we prove the following convexity result for $d(x, y)$.

Lemma 1: The function $d(x, y)$ is convex in $(x, y)$, where
$(x, y) \in \mathbb{R}_+^2$.

Proof: This follows from the fact that $d(\cdot)$ is convex and
non-decreasing in its argument. For a formal proof, see the Appendix. ■

We further impose the following condition on $d(x, y)$, where
$(x, y) \in \mathbb{R}_+^2$. We allow a general cost function $d(x, y)$ endowed
with the following property: (C4) The function $d(x, y)$ is
positive, twice continuously partially differentiable in the variables
$x$ and $y$ and, for all $x, y \in \mathbb{R}_+$,

$$d_{xx}(x, y) > 0, \quad d_{xy}(x, y) > 0, \quad d_{yy}(x, y) > 0, \tag{1}$$

where $d_{xy}(x, y) = \frac{\partial^2 d(x, y)}{\partial x \, \partial y}$. These properties also hold for the
mean delay and the power functions mentioned earlier.

Finally define, for $(m, n) \in \mathbb{Z}_+^2$, $\Delta_1(m, n) = d(m + 1, n) - d(m, n)$
and $\Delta_2(m, n) = d(m, n + 1) - d(m, n)$.

Lemma 2: $\Delta_1(m, n)$ and $\Delta_2(m, n)$ are non-decreasing in
both the coordinates $m$ and $n$.

Proof: This follows directly from (1). See Appendix A-B
for details. ■
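
The monotonicity stated in Lemma 2 can be checked numerically for a specific cost function. The sketch below does so for the power cost on a finite grid; the parameter values (in particular the path-loss exponent 3), the grid size, and the function names are assumptions chosen for illustration, and a finite check of this kind is of course not a proof.

```python
import math

def d(x, y, p_min=0.1, gamma=0.01, eta=3.0):
    """Power cost of a hop between (0, 0) and (x, y)."""
    return p_min + gamma * math.hypot(x, y) ** eta

def delta1(m, n):
    """Increase in hop cost caused by one more step East."""
    return d(m + 1, n) - d(m, n)

def delta2(m, n):
    """Increase in hop cost caused by one more step North."""
    return d(m, n + 1) - d(m, n)

def is_nondecreasing(f, size=30, tol=1e-12):
    """Check f(m+1, n) >= f(m, n) and f(m, n+1) >= f(m, n) on a grid."""
    return all(
        f(m + 1, n) >= f(m, n) - tol and f(m, n + 1) >= f(m, n) - tol
        for m in range(size) for n in range(size)
    )

lemma2_holds = is_nondecreasing(delta1) and is_nondecreasing(delta2)
```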

C. Deployment Policies and Problem Formulation 

A deployment policy $\pi$ is a sequence of mappings $(\mu_k : k \geq 0)$,
where at the k-th step of the path (provided that the
path has not ended thus far) $\mu_k$ allows the deployment person
to decide whether to place or not to place a relay where,
in general, randomization over these two actions is allowed.
The decision is based on the entire information available to
the deployment person at the k-th step, namely the set of
vertices traced by the path and the locations of the previous
vertices where relays were placed. Let $\Pi$ represent the set of
all policies. For a given policy $\pi \in \Pi$, let $E_\pi$ represent the
expectation operator under policy $\pi$. Let C denote the total
cost incurred and N the total number of relays used. We are
interested in solving the following problem,

$$\min_{\pi \in \Pi} \; E_\pi C + \lambda E_\pi N, \tag{2}$$

where $\lambda > 0$ may be interpreted as the cost of a relay.
Solving the problem in (2) can also help us solve the following
constrained problem,

$$\min_{\pi \in \Pi} \; E_\pi C \quad \text{subject to: } E_\pi N \leq \rho_{\mathrm{avg}}, \tag{3}$$

where $\rho_{\mathrm{avg}} > 0$ is a constraint on the average number of
relays (we will describe this procedure in Section VI). First,
in Sections III to V we work towards obtaining an efficient
solution to the problem in (2).

III. MDP Formulation and Solution 

In this section we formulate the problem in (2) as a total cost
infinite horizon MDP and derive the optimal policy in terms of 
optimal placement set. We show that this set is characterized 
by a two-dimensional boundary, upon crossing which it is 
optimal to place a relay. 

A. States, Actions, State-Transitions and Cost Structure

We formulate the problem as a sequential decision process
starting at the origin of the lattice path. The decision to place or
not place a relay at the k-th step is based on $((M_k, N_k), Z_k)$,
where $(M_k, N_k)$ denotes the coordinates of the deployment
person with respect to the previous relay and $Z_k \in \{e, c\}$;
$Z_k = e$ means that at step k the random lattice path has
ended and $Z_k = c$ means that the path will continue in the
same direction for at least one more step. Thus, the state space
is given by:

$$\mathcal{S} = \{(m, n, z) : (m, n) \in \mathbb{Z}_+^2, \; z \in \{e, c\}\} \cup \{\phi\}, \tag{4}$$

where $\phi$ denotes the cost-free terminal state, i.e., the state after
the end of the path has been discovered. The action taken at
step k is denoted $U_k \in \{0, 1\}$, where $U_k = 1$ is the action to
place a relay, and $U_k = 0$ is the action of not placing a relay.
When the state is (m, n, c) and when action u is taken, the
transition probabilities are given by:

• If u is 0 then
(i) $(m, n, c) \to (m + 1, n, c)$ w.p. $(1 - p)q$
(ii) $(m, n, c) \to (m + 1, n, e)$ w.p. $pq$
(iii) $(m, n, c) \to (m, n + 1, c)$ w.p. $(1 - p)(1 - q)$
(iv) $(m, n, c) \to (m, n + 1, e)$ w.p. $p(1 - q)$.

• If u is 1 then
(i) $(m, n, c) \to (1, 0, c)$ w.p. $(1 - p)q$
(ii) $(m, n, c) \to (1, 0, e)$ w.p. $pq$
(iii) $(m, n, c) \to (0, 1, c)$ w.p. $(1 - p)(1 - q)$
(iv) $(m, n, c) \to (0, 1, e)$ w.p. $p(1 - q)$.

If $Z_k = e$ then the only allowable action is u = 1 and we
enter into the state $\phi$. If the current state is $\phi$, we stay in the
same cost-free termination state irrespective of the control u.
The one step cost when the state is $s \in \mathcal{S}$ is given by:

$$c(s, u) = \begin{cases} d(m, n) & \text{if } s = (m, n, e), \\ \lambda + d(m, n) & \text{if } u = 1 \text{ and } s = (m, n, c), \\ 0 & \text{if } u = 0 \text{ or } s = \phi. \end{cases}$$



For simplicity we write the state (m,n,c) as simply (m,n). 
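
The transition law and the one-step cost above can be written down compactly in code. The following Python sketch is illustrative only: the function names, the tuple encoding of states, and the use of `None` for the terminal state are assumptions of this example.

```python
def transitions(state, u, p, q):
    """Next-state distribution from state (m, n, 'c') under action u.

    u = 1 places a relay, which resets the coordinates to (0, 0)
    before the next step; u = 0 keeps counting from (m, n).
    """
    m, n, z = state
    if z != "c":
        raise ValueError("transitions are defined here only for z = 'c'")
    bm, bn = (0, 0) if u == 1 else (m, n)
    return {
        (bm + 1, bn, "c"): (1 - p) * q,        # East, path continues
        (bm + 1, bn, "e"): p * q,              # East, path ends
        (bm, bn + 1, "c"): (1 - p) * (1 - q),  # North, path continues
        (bm, bn + 1, "e"): p * (1 - q),        # North, path ends
    }

def one_step_cost(state, u, lam, d):
    """One-step cost c(s, u); `state` is None for the terminal state."""
    if state is None:
        return 0.0
    m, n, z = state
    if z == "e":
        return d(m, n)                  # last link to the source node
    return lam + d(m, n) if u == 1 else 0.0
```

For example, `transitions((3, 2, "c"), 1, p, q)` only involves the states (1, 0, ·) and (0, 1, ·), reflecting the renewal that occurs when a relay is placed.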

B. Optimal Placement Set $\mathcal{P}_\lambda$

Let $J_\lambda(m, n)$ denote the optimal cost-to-go when the current
state is (m, n). When at some step the state is (m, n) the
deployment person has to decide whether to place or not place
a relay at the current step. $J_\lambda$ is the solution of the Bellman
equation [12, Page 137, Prop. 1.1],

$$J_\lambda(m, n) = \min\{c_p(m, n), c_{np}(m, n)\}, \tag{5}$$

where $c_p(m, n)$ and $c_{np}(m, n)$ denote the expected cost incurred
when the decision is to place and not place a relay,
respectively. $c_p(m, n)$ is given by

$$c_p(m, n) = \lambda + d(m, n) + (1 - p)(1 - q) J_\lambda(0, 1) + (1 - p) q J_\lambda(1, 0) + p\, d(1). \tag{6}$$

The term $\lambda + d(m, n)$ in the above expression is the one
step cost which is first incurred when a relay is placed. The
remaining terms are the average cost-to-go from the next step.
The term $(1 - p)(1 - q) J_\lambda(0, 1)$ can be understood as follows:
$(1 - p)(1 - q)$ is the probability that the path proceeds Northward
without ending. Thus the state at the next step is (0, 1, c)
w.p. $(1 - p)(1 - q)$, the optimal cost-to-go from which is
$J_\lambda(0, 1)$. Similarly for the term $(1 - p) q J_\lambda(1, 0)$, $(1 - p) q$
is the probability that the path will proceed, without ending,
towards the East (thus the next state is (1, 0, c)) and $J_\lambda(1, 0)$
is the cost-to-go from the next state. Finally, in the term $p\, d(1)$,
p is the probability that the path will end, either proceeding
East or North, at the next step and d(1) is the cost of the
last link. Following a similar explanation, the expression for
$c_{np}(m, n)$ can be written as:

$$c_{np}(m, n) = (1 - p) q J_\lambda(m + 1, n) + (1 - p)(1 - q) J_\lambda(m, n + 1) + p q\, d(m + 1, n) + p (1 - q)\, d(m, n + 1). \tag{7}$$

We define the optimal placement set $\mathcal{P}_\lambda$ as the set of all
lattice points (m, n), where it is optimal to place rather than
to not place a relay. Formally,

$$\mathcal{P}_\lambda = \{(m, n) : c_p(m, n) \leq c_{np}(m, n)\}. \tag{8}$$

In this definition, if the costs of placing and not-placing are
the same, we have arbitrarily chosen to place at that point.
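
For reference, the Bellman equation can be solved numerically by plain value iteration on a truncated grid. The sketch below is one possible way of doing this and is not taken from the paper: the truncation (a relay is always placed on the outer edge of the grid), the grid size, the stopping tolerance, and the power-cost parameters are all assumptions made for illustration.

```python
import math

def d(m, n, p_min=0.1, gamma=0.01, eta=2.0):
    """Power cost of a hop spanning m steps East and n steps North."""
    return p_min + gamma * math.hypot(m, n) ** eta

def value_iteration(p, q, lam, size=30, tol=1e-10, max_sweeps=5000):
    """Iterate the Bellman equation on the grid {0..size}^2.

    Assumption of this sketch: on the outer edge of the grid a relay
    is always placed, which truncates the infinite state space.
    Returns the cost-to-go table J and the set of placement points.
    """
    J = [[0.0] * (size + 1) for _ in range(size + 1)]
    placed = set()
    for _ in range(max_sweeps):
        # Expected cost after a fresh relay: c_p minus (lam + d(m, n)).
        fresh = ((1 - p) * (1 - q) * J[0][1] + (1 - p) * q * J[1][0]
                 + p * d(1, 0))
        new = [[0.0] * (size + 1) for _ in range(size + 1)]
        placed = set()
        change = 0.0
        for m in range(size + 1):
            for n in range(size + 1):
                c_p = lam + d(m, n) + fresh
                if m == size or n == size:
                    c_np = math.inf          # forced placement (truncation)
                else:
                    c_np = ((1 - p) * q * J[m + 1][n]
                            + (1 - p) * (1 - q) * J[m][n + 1]
                            + p * q * d(m + 1, n)
                            + p * (1 - q) * d(m, n + 1))
                if c_p <= c_np:              # ties are resolved by placing
                    placed.add((m, n))
                new[m][n] = min(c_p, c_np)
                change = max(change, abs(new[m][n] - J[m][n]))
        J = new
        if change < tol:
            break
    return J, placed

# Illustrative parameters (not from the paper).
J, placed = value_iteration(p=0.1, q=0.5, lam=1.0)
```

Each sweep touches every grid point, and the number of sweeps grows as p becomes small, which is why a cheaper characterization of the placement set is of interest.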

The above result yields the following main theorem of this 
section which characterizes the optimal placement set V\ in 
terms of a boundary. 

Theorem 1: The optimal placement set $\mathcal{P}_\lambda$ is characterized
by a boundary, i.e., there exist mappings $m^* : \mathbb{Z}_+ \to \mathbb{Z}_+$ and
$n^* : \mathbb{Z}_+ \to \mathbb{Z}_+$ such that:

$$\mathcal{P}_\lambda = \bigcup_{n \in \mathbb{Z}_+} \{(m, n) : m \geq m^*(n)\} \tag{9}$$

$$\phantom{\mathcal{P}_\lambda} = \bigcup_{m \in \mathbb{Z}_+} \{(m, n) : n \geq n^*(m)\}. \tag{10}$$

Proof Outline: The proof utilizes the conditions C2 and
C3 imposed on the cost function $d(\cdot)$. First, using (6) and (7)
in (8) and rearranging we alternatively write $\mathcal{P}_\lambda$ as $\mathcal{P}_\lambda =
\{(m, n) : F(m, n) \geq K\}$, where K is a constant and $F(\cdot, \cdot)$
is some function of m and n. Then, we complete the proof
by showing that F(m, n) is non-decreasing in both m and n.
This requires us to prove (using an induction argument) that
$H_\lambda(m, n) := J_\lambda(m, n) - d(m, n)$ is non-decreasing in m and
n. Also, Lemma 2 has to be used here. For a formal proof see
Appendix B. ■
Remark: Though the optimal placement set $\mathcal{P}_\lambda$ was characterized
nicely in terms of a boundary $m^*(\cdot)$ and $n^*(\cdot)$, a naive
approach of computing this boundary, using value iteration to
obtain $J_\lambda(m, n)$ (for several values of $(m, n) \in \mathbb{Z}_+^2$), would
be computationally intensive. Our effort in the next section
(Section IV) is towards obtaining an alternate simplified representation
for $\mathcal{P}_\lambda$ using which we propose an algorithm in
Section V which is guaranteed to return $\mathcal{P}_\lambda$ in a finite (in
practice, small) number of steps.

IV. Optimal Stopping Formulation

We observe that the points where the path has not ended,
and a relay is placed, are renewal points of the decision
process. This motivates us to think of the decision process
after a relay is placed as an optimal stopping problem with
termination cost $J_\lambda(0, 0)$ (which is the optimal cost-to-go from
a relay placement point). Let $\bar{\mathcal{P}}_\lambda$ denote the placement set
corresponding to the OSLA rule (to be defined next). In this
section we prove our next main result that $\bar{\mathcal{P}}_\lambda = \mathcal{P}_\lambda$.

A. One-Step-Look-Ahead Stopping Set $\bar{\mathcal{P}}_\lambda$

Under the OSLA rule, a relay is placed at state (m, n, c)
if and only if the "cost $c_1(m, n)$ of stopping (i.e., placing a
relay) at the current step" does not exceed the "cost $c_2(m, n)$ of
continuing (without placing a relay at the current step) for one
more step, and then stopping (i.e., placing a relay at the next
step)". The expressions for the costs $c_1(m, n)$ and $c_2(m, n)$
can be written as:

$$c_1(m, n) = \lambda + d(m, n) + J_\lambda(0, 0)$$

and

$$c_2(m, n) = p\big(q\, d(m + 1, n) + (1 - q)\, d(m, n + 1)\big) + (1 - p)\big(q\, d(m + 1, n) + (1 - q)\, d(m, n + 1) + \lambda + J_\lambda(0, 0)\big).$$

Then we define the OSLA placement set $\bar{\mathcal{P}}_\lambda$ as:

$$\bar{\mathcal{P}}_\lambda = \{(m, n) \in \mathbb{Z}_+^2 : c_1(m, n) \leq c_2(m, n)\}.$$

Substituting for $c_1(m, n)$ and $c_2(m, n)$ and simplifying we
obtain:

$$\bar{\mathcal{P}}_\lambda = \{(m, n) \in \mathbb{Z}_+^2 : p(\lambda + J_\lambda(0, 0)) \leq \Delta_q(m, n)\}, \tag{11}$$

where $\Delta_q(m, n) = q \Delta_1(m, n) + (1 - q) \Delta_2(m, n)$.

Theorem 2: The OSLA rule is a threshold policy, i.e., there
exist mappings $\bar{m} : \mathbb{Z}_+ \to \mathbb{Z}_+$ and $\bar{n} : \mathbb{Z}_+ \to \mathbb{Z}_+$, which
define the one-step placement set $\bar{\mathcal{P}}_\lambda$ as follows,

$$\bar{\mathcal{P}}_\lambda = \bigcup_{n \in \mathbb{Z}_+} \{(m, n) : m \geq \bar{m}(n)\} \tag{12}$$

$$\phantom{\bar{\mathcal{P}}_\lambda} = \bigcup_{m \in \mathbb{Z}_+} \{(m, n) : n \geq \bar{n}(m)\}. \tag{13}$$

Proof: Noticing that in (11) $\Delta_q(m, n)$ is non-decreasing
in (m, n) and $p(\lambda + J_\lambda(0, 0))$ is a constant, the proof follows
along the lines of the proof of Theorem 1. ■
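
To make the threshold structure concrete, the sketch below computes, for each n, the smallest m at which the OSLA condition holds, given a value `h` standing in for the cost-to-go at a renewal point. The function names, the power-cost parameters, and the value h = 0.975 are assumptions chosen purely for illustration.

```python
import math

def d(m, n, p_min=0.1, gamma=0.01, eta=2.0):
    """Power cost of a hop spanning m steps East and n steps North."""
    return p_min + gamma * math.hypot(m, n) ** eta

def delta_q(m, n, q):
    """q * Delta_1(m, n) + (1 - q) * Delta_2(m, n)."""
    return q * (d(m + 1, n) - d(m, n)) + (1 - q) * (d(m, n + 1) - d(m, n))

def osla_boundary(h, p, q, lam, n_max, m_cap=10_000):
    """For n = 0..n_max, the smallest m with p * (lam + h) <= delta_q(m, n).

    `h` plays the role of the renewal cost-to-go; `m_cap` is only a
    safety bound for cost functions that violate condition (C3).
    """
    threshold = p * (lam + h)
    boundary = []
    for n in range(n_max + 1):
        m = 0
        while delta_q(m, n, q) < threshold and m < m_cap:
            m += 1
        boundary.append(m)
    return boundary

# With eta = 2 and q = 0.5, delta_q(m, n) = gamma * (m + n + 1), so the
# boundary is a straight line m + n = const.
m_bar = osla_boundary(h=0.975, p=0.1, q=0.5, lam=1.0, n_max=25)
```

Because the increments are non-decreasing in both coordinates, the computed boundary is non-increasing in n.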

Now, we present the main theorem of this section.

Theorem 3: $\bar{\mathcal{P}}_\lambda = \mathcal{P}_\lambda$.

Proof: See Appendix C. ■

Remark: The characterization in (11) is much simpler than
the one in (20) once the value of $J_\lambda(0, 0)$ is given. In the
following subsection, we define a function $g(\cdot)$ and express
$J_\lambda(0, 0)$ as the minimum value of this function.

B. Computation of $J_\lambda(0, 0)$

Let us start by defining a collection of placement sets
indexed by $h \geq 0$:

$$\mathcal{P}(h) = \{(m, n) \in \mathbb{Z}_+^2 : p(\lambda + h) \leq \Delta_q(m, n)\}. \tag{14}$$

Referring to (11), note that $\mathcal{P}(J_\lambda(0, 0)) = \bar{\mathcal{P}}_\lambda$. Let $g(h)$ denote
the cost-to-go, starting from (0, 0), if the placement set $\mathcal{P}(h)$
is employed. Then, since $J_\lambda(0, 0)$ is the optimal cost-to-go
and $\bar{\mathcal{P}}_\lambda \in \{\mathcal{P}(h)\}_{h \geq 0}$, we have $J_\lambda(0, 0) = \min_{h \geq 0} g(h)$.

To compute g(h), we proceed by defining the boundary
$\mathcal{B}(h)$ of $\mathcal{P}(h)$ as follows:

$$\mathcal{B}(h) = \{(m, n) \in \mathcal{P}(h) : (m - 1, n) \in \mathcal{P}^c(h) \text{ or } (m, n - 1) \in \mathcal{P}^c(h)\}, \tag{15}$$

where $\mathcal{P}^c(h) := \mathbb{Z}_+^2 \setminus \mathcal{P}(h)$.

Suppose the corridor ends at some $(m, n) \in \mathcal{P}^c(h) \cup \mathcal{B}(h)$,
then only a cost of d(m, n) is incurred. Otherwise (i.e., if the
corridor reaches some $(m, n) \in \mathcal{B}(h)$ and continues), using a
renewal argument, a cost of $d(m, n) + \lambda + g(h)$ is incurred,
where $d(m, n) + \lambda$ is the cost of placing a relay and g(h) is
the future cost-to-go. We can thus write:

$$g(h) = \sum_{(m, n) \in \mathcal{P}^c(h) \cup \mathcal{B}(h)} P((m, n), e)\, d(m, n) + \sum_{(m, n) \in \mathcal{B}(h)} P((m, n), c)\big(g(h) + \lambda + d(m, n)\big), \tag{16}$$

where P((m, n), e) is the probability of the corridor ending
at (m, n) and P((m, n), c) is the probability of the corridor
reaching the boundary and continuing. Solving for g(h), we
obtain:

$$g(h) = \frac{1}{1 - \sum_{(m, n) \in \mathcal{B}(h)} P((m, n), c)} \left( \sum_{(m, n) \in \mathcal{P}^c(h) \cup \mathcal{B}(h)} P((m, n), e)\, d(m, n) + \sum_{(m, n) \in \mathcal{B}(h)} P((m, n), c)\big(\lambda + d(m, n)\big) \right). \tag{17}$$

The above expression is extensively used in our algorithm
proposed in the next section.

Fig. 3. Example of a placement set of the form in (14): 'o' denotes lattice points
outside the placement set; lattice points on the boundary can be partitioned
into three sets according to the direction from which they can be reached.
(Horizontal axis: East coordinates (m).)

We conclude this subsection by deriving the expressions for
the probabilities P((m, n), e) and P((m, n), c). Let us partition
the boundary $\mathcal{B}(h)$ into three mutually disjoint sets:

$$\mathcal{B}_w(h) = \{(m, n) \in \mathcal{B}(h) : (m - 1, n) \in \mathcal{B}(h)\},$$
$$\mathcal{B}_s(h) = \{(m, n) \in \mathcal{B}(h) : (m, n - 1) \in \mathcal{B}(h)\},$$
$$\mathcal{B}_{null}(h) = \{(m, n) \in \mathcal{B}(h) : (m - 1, n) \notin \mathcal{B}(h) \text{ and } (m, n - 1) \notin \mathcal{B}(h)\}.$$

For a depiction of the various boundary points, see Fig. 3.
Now, P((m, n), e) can be written as:

$$P((m, n), e) = \begin{cases} \binom{m+n}{m}\, p (1 - p)^{m+n-1} q^m (1 - q)^n & \text{if } (m, n) \in \mathcal{P}^c(h) \cup \mathcal{B}_{null}(h), \\ \binom{m+n-1}{m}\, p (1 - p)^{m+n-1} q^m (1 - q)^n & \text{if } (m, n) \in \mathcal{B}_w(h), \\ \binom{m+n-1}{n}\, p (1 - p)^{m+n-1} q^m (1 - q)^n & \text{if } (m, n) \in \mathcal{B}_s(h). \end{cases}$$

This can be understood as follows. Any point $(m, n) \in
\mathcal{P}^c(h) \cup \mathcal{B}_{null}(h)$ can be reached from West or South. $\binom{m+n}{m}$
is the number of possible paths for reaching (m, n). Each
such path has to go m times Eastwards (thus the term $q^m$)
and n times Northwards (thus the term $(1 - q)^n$) and finally
end at (m, n) (thus the term $p(1 - p)^{m+n-1}$). Any point
$(m, n) \in \mathcal{B}_w(h)$ can be reached only from the South point
(m, n - 1). The probability of reaching (m, n - 1) without
ending is $\binom{m+n-1}{m} (1 - p)^{m+n-1} q^m (1 - q)^{n-1}$. Then, the
corridor reaches (m, n) and ends with probability p(1 - q).
P((m, n), e) for $(m, n) \in \mathcal{B}_s(h)$ can be obtained analogously.
Similarly, P((m, n), c) can be written as:

$$P((m, n), c) = \begin{cases} \binom{m+n}{m} (1 - p)^{m+n} q^m (1 - q)^n & \text{if } (m, n) \in \mathcal{P}^c(h) \cup \mathcal{B}_{null}(h), \\ \binom{m+n-1}{m} (1 - p)^{m+n} q^m (1 - q)^n & \text{if } (m, n) \in \mathcal{B}_w(h), \\ \binom{m+n-1}{n} (1 - p)^{m+n} q^m (1 - q)^n & \text{if } (m, n) \in \mathcal{B}_s(h). \end{cases}$$
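
As a sanity check on these expressions, the sketch below evaluates the first case (points that can be reached from both West and South) and verifies that, when no relay is placed, the ending probabilities over all points within a given number of steps add up to the probability that the path ends within that many steps. The function names and parameter values are assumptions of this example.

```python
from math import comb

def end_probability(m, n, p, q):
    """P((m, n), e) for a point reachable from both West and South."""
    k = m + n
    return comb(k, m) * p * (1 - p) ** (k - 1) * q ** m * (1 - q) ** n

def continue_probability(m, n, p, q):
    """P((m, n), c) for the same kind of point."""
    k = m + n
    return comb(k, m) * (1 - p) ** k * q ** m * (1 - q) ** n

def total_end_probability(p, q, max_steps):
    """Probability that the path ends within `max_steps` steps."""
    return sum(end_probability(m, k - m, p, q)
               for k in range(1, max_steps + 1) for m in range(k + 1))

# Illustrative values: the sum should equal 1 - (1 - p)**max_steps.
total = total_end_probability(p=0.1, q=0.6, max_steps=150)
```

The two probabilities differ only by the factor (1 - p)/p, since reaching a point and continuing replaces the final "end" event by a "continue" event.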

V. OSLA Based Fixed Point Iteration Algorithm 

In this section, we present an efficient fixed point iteration
algorithm (Algorithm 1) using the OSLA rule in (11) for obtaining
the optimal placement set, $\mathcal{P}_\lambda$, and the optimal cost-to-go,
$J_\lambda(0, 0)$. There are two advantages of our algorithm over
the naive approach of directly trying to minimize the function
$g(\cdot)$ to obtain $J_\lambda(0, 0)$ (recall that $J_\lambda(0, 0) = \min_{h \geq 0} g(h)$):

• On the theoretical side, this iterative algorithm avoids ex- 
plicit optimization altogether, which, otherwise would be 
performed numerically over a continuous range. Without 
any structure on the objective function, direct numerical 
minimization of g(-) is difficult and often unsatisfactory, 
as it invariably uses some sort of heuristic search over 
this continuous range. 

• On the practical side, this algorithm is proved to converge 
within a finite number of iterations and observed to be 
extremely fast (requires 3 to 4 iterations typically). 

The following is our Algorithm which we refer to as the 
OSLA Based Fixed Point Iteration Algorithm. 

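Algorithm 1 OSLA Based Fixed Point Iteration Algorithm
Require: $0 < p < 1$, $0 < q < 1$, $\lambda > 0$
  $k = 0$, $h^{(k)} = 0$
  while 1 do
    $\mathcal{P}(h^{(k)}) \leftarrow \{(m, n) \in \mathbb{Z}_+^2 : p(\lambda + h^{(k)}) \leq \Delta_q(m, n)\}$
    Compute $g(h^{(k)})$ using (17)
    if $g(h^{(k)}) == h^{(k)}$ then
      Break;
    end if
    $h^{(k+1)} \leftarrow g(h^{(k)})$
    $k \leftarrow k + 1$
  end while
  return $g(h^{(k)})$, $\mathcal{P}(h^{(k)})$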
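
A minimal Python sketch of this iteration is given below. It is an illustration under stated assumptions rather than the implementation used for the numerical results: the cost-to-go g(h) is evaluated by propagating reach probabilities step by step instead of using the binomial expressions, the first placement decision is taken only after the first step, equality of g(h) and h is tested up to a small tolerance, and the cost parameters and function names are hypothetical.

```python
import math

def d(m, n, p_min=0.1, gamma=0.01, eta=2.0):
    """Power cost of a hop spanning m steps East and n steps North."""
    return p_min + gamma * math.hypot(m, n) ** eta

def in_placement_set(m, n, h, p, q, lam):
    """Membership test for P(h): p * (lam + h) <= Delta_q(m, n)."""
    delta_q = q * (d(m + 1, n) - d(m, n)) + (1 - q) * (d(m, n + 1) - d(m, n))
    return p * (lam + h) <= delta_q

def cost_to_go(h, p, q, lam, max_steps=10_000):
    """g(h): expected cost from a renewal point when P(h) is used."""
    reach = {(1, 0): q, (0, 1): 1 - q}   # P(arrive at point, no relay yet)
    end_cost = place_cost = place_prob = 0.0
    steps = 0
    while reach:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("placement set not reached; check the cost")
        nxt = {}
        for (m, n), pr in reach.items():
            end_cost += pr * p * d(m, n)          # path ends at (m, n)
            cont = pr * (1 - p)                    # path continues
            if in_placement_set(m, n, h, p, q, lam):
                place_prob += cont                 # relay placed: renewal
                place_cost += cont * (lam + d(m, n))
            else:
                nxt[(m + 1, n)] = nxt.get((m + 1, n), 0.0) + cont * q
                nxt[(m, n + 1)] = nxt.get((m, n + 1), 0.0) + cont * (1 - q)
        reach = nxt
    return (end_cost + place_cost) / (1 - place_prob)

def osla_fixed_point(p, q, lam, tol=1e-12, max_iter=100):
    """Iterate h <- g(h) from h = 0 until g(h) equals h (up to tol)."""
    h = 0.0
    for k in range(max_iter):
        g = cost_to_go(h, p, q, lam)
        if abs(g - h) <= tol:
            return g, k
        h = g
    raise RuntimeError("no fixed point found within max_iter iterations")

# Illustrative parameters (not from the paper).
g_star, iterations = osla_fixed_point(p=0.1, q=0.5, lam=1.0)
```

Since g(h) depends on h only through the set P(h), the iterate stops changing as soon as the placement set repeats, which is what makes the equality test meaningful.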



We now prove the correctness and finite termination properties
of our algorithm. First, we define $g^* := J_\lambda(0, 0) =
\min_{h \geq 0} g(h)$. Now consider a sample plot of the function g(h)
in Fig. 4. From Fig. 4(a) observe that whenever $h > g^*$ (which
is around 150), $h > g(h)$. Also, Fig. 4(b) (where we have
plotted the functions g(h) and $l(h) = h$) suggests that g(h)
has a unique fixed point. We formally prove these results.

Lemma 3: If h > g* then h > g(h). 



Proof: This follows from the manipulation of (17). See
Appendix D for details. ■

Lemma 4: g(h) has a unique fixed point. 

Proof: From (14) and (11), we observe that
$\mathcal{P}(J_\lambda(0, 0)) = \bar{\mathcal{P}}_\lambda$. From Theorem 3, $\bar{\mathcal{P}}_\lambda$ is the optimal
placement set and thus the cost-to-go of using $\mathcal{P}(J_\lambda(0, 0))$ is
$J_\lambda(0, 0)$, i.e., $g(J_\lambda(0, 0)) = J_\lambda(0, 0)$. Hence, $J_\lambda(0, 0) = g^*$
is a fixed point of $g(\cdot)$. Now, any $h > g^*$ cannot be a fixed
point since, in this case, $h > g(h)$ from Lemma 3. On the
other hand, any $h < g^*$ is such that $h < g^* \leq g(h)$ because
$g^*$ is the optimal cost-to-go. Hence, $g^*$ is the unique fixed
point of $g(\cdot)$. ■

We are now ready to prove the convergence property of our 
Algorithm. 

Lemma 5: 1) The sequence $\{h^{(k)}\}_{k \geq 1}$ (in Algorithm 1)
is non-increasing, i.e., $h^{(k+1)} \leq h^{(k)}$, with the equality
sign holding if and only if $h^{(k)} = g^*$.

2) The sequence $\{\mathcal{P}^c(h^{(k)})\}_{k \geq 1}$ is non-increasing, i.e.,
$\mathcal{P}^c(h^{(k+1)}) \subseteq \mathcal{P}^c(h^{(k)})$, where the containment is strict
whenever $\mathcal{P}^c(h^{(k+1)}) \supsetneq \mathcal{P}_\lambda^c$.

Proof: 1) Note first that $h^{(k)} \geq g^*$ for $k \geq 1$ because
$h^{(k)} = g(h^{(k-1)}) \geq g^*$. Then, for $k \geq 1$, we have either
$h^{(k)} = g^*$ or $h^{(k)} > g^*$. In the first case $h^{(k+1)} = g(h^{(k)}) =
g(g^*) = g^* = h^{(k)}$ and we can stop, whereas in the second
case, from Lemma 3 we have $h^{(k+1)} = g(h^{(k)}) < h^{(k)}$.

2) From (14), $h_2 \geq h_1$ implies $\mathcal{P}^c(h_1) \subseteq \mathcal{P}^c(h_2)$. Hence,
as $\{h^{(k)}\}_{k \geq 1}$ is non-increasing (from Part 1)), $\{\mathcal{P}^c(h^{(k)})\}_{k \geq 1}$
is also non-increasing.

Suppose $\mathcal{P}^c(h^{(k+1)}) = \mathcal{P}^c(h^{(k)})$; then $g(h^{(k+1)}) =
g(h^{(k)}) = h^{(k+1)}$ (the second equality is by the definition of
$\{h^{(k)}\}$), which implies $h^{(k+1)} = g^*$ (since $g(\cdot)$ has a unique
fixed point, see Lemma 4). Thus, $\mathcal{P}^c(h^{(k+1)}) = \mathcal{P}_\lambda^c$. ■

Fig. 4. (a) Cost-to-go g(h) as a function of h. (b) Zoom on the cost-to-go
g(h) as a function of h. These plots are for p = 0.02, q = 0.5, and $\lambda$ = 41.

Theorem 4: Algorithm 1 returns $g^*$ and $\mathcal{P}_\lambda$ in a finite
number of steps.

Proof: Noting that $h^{(1)} = g(h^{(0)}) \ge g^*$ and using (14),
we have $\mathcal{P}^c_\lambda \subseteq \mathcal{P}^c(h^{(1)})$. Either $\mathcal{P}^c_\lambda = \mathcal{P}^c(h^{(1)})$, in which
case the algorithm stops. Otherwise, note that both sets, $\mathcal{P}^c_\lambda$
and $\mathcal{P}^c(h^{(1)})$, contain a finite number of lattice points (from
the definition of $\mathcal{P}(h)$ in (14)). Using Lemma 5, $\mathcal{P}^c(h^{(k)})$
converges to $\mathcal{P}^c_\lambda$ in at most $|\mathcal{P}^c(h^{(1)}) - \mathcal{P}^c_\lambda| < \infty$ iterations.
Once $\mathcal{P}^c(h^{(k)})$ converges to $\mathcal{P}^c_\lambda$, the algorithm stops and
returns the optimal cost-to-go $g^*$. ■
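The iteration analysed above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's Algorithm 1 verbatim: it assumes the hop cost $d(r) = P_m + \gamma r^\eta$ of Section VII, evaluates $g(h)$ by solving (32) of Appendix D for $g(h)$, and stops when the no-placement set $\mathcal{P}^c(h)$ no longer changes. The function names, the starting value `h0`, and the stopping rule are assumptions made for the example.

```python
import math

def hop_cost(m, n, P_m=0.1, gamma=0.01, eta=2.0):
    """Hop cost d(m, n) = P_m + gamma * (Euclidean distance)**eta."""
    return P_m + gamma * math.hypot(m, n) ** eta

def delta_q(m, n, q):
    """Delta_q(m, n) = q*Delta_1(m, n) + (1 - q)*Delta_2(m, n)."""
    d = hop_cost(m, n)
    return q * (hop_cost(m + 1, n) - d) + (1 - q) * (hop_cost(m, n + 1) - d)

def reach_prob(m, n, p, q):
    """r(m, n) = (1-p)^(m+n) C(m+n, m) q^m (1-q)^n, in log space (needs 0 < q < 1)."""
    log_r = (math.lgamma(m + n + 1) - math.lgamma(m + 1) - math.lgamma(n + 1)
             + (m + n) * math.log1p(-p) + m * math.log(q) + n * math.log1p(-q))
    return math.exp(log_r)

def no_place_set(h, p, q, lam):
    """P^c(h) = {(m, n): p*(lam + h) > Delta_q(m, n)}, scanned column by column.
    Assumes Delta_q is non-decreasing in m and n (Lemma 2)."""
    thr = p * (lam + h)
    pts, m = [], 0
    while delta_q(m, 0, q) < thr:
        n = 0
        while delta_q(m, n, q) < thr:
            pts.append((m, n))
            n += 1
        m += 1
    return pts

def cost_to_go(pts, p, q, lam):
    """Cost-to-go of the policy whose no-placement set is pts, solved from (32)."""
    if not pts:
        raise ValueError("empty no-placement set: cost-to-go is infinite")
    sum_r = sum(reach_prob(m, n, p, q) for m, n in pts)
    sum_rd = sum(reach_prob(m, n, p, q) * delta_q(m, n, q) for m, n in pts)
    return (sum_rd + hop_cost(0, 0) + lam) / (p * sum_r) - lam

def fixed_point_iteration(h0, p, q, lam, max_iter=1000):
    """Iterate h <- g(h) until P^c(h) stops changing; returns (g*, P^c, iterates)."""
    hs, prev = [h0], None
    for _ in range(max_iter):
        pts = no_place_set(hs[-1], p, q, lam)
        if pts == prev:
            return hs[-1], pts, hs
        hs.append(cost_to_go(pts, p, q, lam))
        prev = pts
    raise RuntimeError("no convergence within max_iter")

# Parameters of Fig. 4; h0 = 0 is an arbitrary (assumed) starting point.
g_star, region, iterates = fixed_point_iteration(h0=0.0, p=0.02, q=0.5, lam=41.0)
print(len(iterates) - 1, "iterations, g* =", g_star)
```

Because $g(h)$ depends on $h$ only through the finite set $\mathcal{P}^c(h)$, comparing successive sets gives an exact stopping test, which mirrors the finite-termination argument in the proof of Theorem 4.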



VI. Solving the Constrained Problem 

In this section, we devise a method to solve the constrained
problem in (3) using the solution of the unconstrained problem
(2) provided by Algorithm 1. This method is applied in
Section VII-B where, imposing a constraint on the average
number of relays, we compare the performance of a distance
based heuristic with the optimal.

We begin with the following standard result which relates
the solutions of the problems in (2) and (3).

Lemma 6: Let $\pi^*_\lambda \in \Pi$ be an optimal policy for the
unconstrained problem in (2) such that $\mathbb{E}_{\pi^*_\lambda} N = \rho_{avg}$. Then
$\pi^*_\lambda$ is also optimal for the constrained problem in (3).

However, the above lemma is useful only when we are able
to exhibit a $\lambda$ such that $\mathbb{E}_{\pi^*_\lambda} N = \rho_{avg}$. The subsequent
development in this section is towards obtaining the solution
to the more general case.

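The expected number of relays used by the optimal policy,
$\pi^*_\lambda$, which uses the optimal placement set $\mathcal{P}_\lambda$, can be computed
as:

$$\mathbb{E}_{\pi^*_\lambda} N = \frac{\sum_{(m,n) \in \mathcal{B}_\lambda} P((m,n),c)}{1 - \sum_{(m,n) \in \mathcal{B}_\lambda} P((m,n),c)}, \tag{18}$$

where $P((m,n),c)$ is the reaching probability corresponding
to $\mathcal{P}_\lambda$ and $\mathcal{B}_\lambda$ is the boundary of $\mathcal{P}_\lambda$. A plot of $\mathbb{E}_{\pi^*_\lambda} N$ vs. $\lambda$
is given in Fig. 5. We make the following observations about
$\mathbb{E}_{\pi^*_\lambda} N$: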

1) $\mathbb{E}_{\pi^*_\lambda} N$ decreases with $\lambda$; this is as expected, since as each
relay becomes "costlier" fewer relays are used on average.

2) Even when $\lambda = 0$, $\mathbb{E}_{\pi^*_\lambda} N$ is finite. This is because $d(0) >
0$, i.e., there is a positive cost for a zero-length link. Define the
value of $\mathbb{E}_{\pi^*_\lambda} N$ with $\lambda = 0$ to be $\rho_{max}$.

3) $\mathbb{E}_{\pi^*_\lambda} N$ vs. $\lambda$ is a piecewise constant function. This occurs
because the relay placement positions are discrete. For a range
of values of $\lambda$ the same threshold is optimal. This structure
is also evident from the results based on the optimal stopping
formulation and the OSLA rule in Section IV. It follows that
for a value of $\lambda$ at which there is a step in the plot, there are
two optimal deterministic policies, $\underline{\pi}$ and $\overline{\pi}$, for the relaxed
problem. Let $\underline{\rho} = \mathbb{E}_{\underline{\pi}} N$ and $\overline{\rho} = \mathbb{E}_{\overline{\pi}} N$.
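The expected relay count can be evaluated numerically for any threshold set. The sketch below is an illustration under stated assumptions: each relay placement restarts the process at $(0,0)$, so the number of relays is geometric in the probability of crossing the boundary, and that crossing probability is taken as one minus $p$ times the sum of the reaching probabilities $r(m,n)$ over the no-placement set, as in (37) of Appendix D. The function names and the triangular example set are hypothetical.

```python
import math

def reach_prob(m, n, p, q):
    """r(m, n): probability the path reaches (m, n) and continues at least one step."""
    return (1 - p) ** (m + n) * math.comb(m + n, m) * q ** m * (1 - q) ** n

def expected_relays(no_place_pts, p, q):
    """Expected number of relays for the policy that places outside no_place_pts."""
    # Probability that the path ends before leaving the no-placement set.
    end_inside = p * sum(reach_prob(m, n, p, q) for m, n in no_place_pts)
    cross = 1.0 - end_inside          # path crosses the boundary and continues
    return cross / (1.0 - cross)      # mean of a geometric number of renewals

# Hypothetical no-placement set {(m, n): m + n < 10}.
pts = [(m, n) for m in range(10) for n in range(10 - m)]
print(expected_relays(pts, p=0.1, q=0.3))
```

Enlarging the no-placement set lowers the crossing probability and hence the expected relay count, which is the mechanism behind the decreasing, piecewise constant curve described in the observations above.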

Fig. 5. Average number of relays $\mathbb{E}_{\pi^*_\lambda} N$ (left) and average power cost
$\mathbb{E}_{\pi^*_\lambda} C$ (right) as a function of $\lambda$ (p = 0.002, q = 0.5 and $\eta$ = 2).

Fig. 6. Average total cost $J_\lambda(0,0)$ as a function of $\lambda$ (p = 0.002, q = 0.5
and $\eta$ = 2).

Fig. 7. Average total cost $J_\lambda(0,0)$ as a function of q (p ..., $\eta$ = 2).

Fig. 8. Boundaries for various values of the path-loss exponent $\eta$ (p = 0.002,
q = 0.5).

We have the following structure of the optimal policy for
the constrained problem:

Theorem 5: 1) For $\rho_{avg} \ge \rho_{max}$ the optimal placement
set is obtained for $\lambda = 0$, i.e., is $\mathcal{P}_0$.
2) For $\rho_{avg} < \rho_{max}$, if there is a $\lambda$ such that (a) $\mathbb{E}_{\pi^*_\lambda} N =
\rho_{avg}$ then the optimal policy is $\pi^*_\lambda$, or (b) $\underline{\rho} < \rho_{avg} < \overline{\rho}$
then the optimal policy is obtained by mixing $\underline{\pi}$ and $\overline{\pi}$.

Proof: 1) is straightforward. For the proof of 2)-(a), see
Lemma 6. Considering now 2)-(b), define $0 < \alpha < 1$ such
that $(1 - \alpha)\underline{\rho} + \alpha\overline{\rho} = \rho_{avg}$. We obtain a mixing policy $\pi_m$ by
choosing $\underline{\pi}$ w.p. $1 - \alpha$ and $\overline{\pi}$ w.p. $\alpha$ at the beginning of the
deployment. For any policy $\pi$ we have the following standard
argument:

$$\begin{aligned}
\mathbb{E}_{\pi_m} C + \lambda \mathbb{E}_{\pi_m} N
&= (1-\alpha)(\mathbb{E}_{\underline{\pi}} C + \lambda\underline{\rho}) + \alpha(\mathbb{E}_{\overline{\pi}} C + \lambda\overline{\rho}) \\
&\le (1-\alpha)(\mathbb{E}_{\pi} C + \lambda \mathbb{E}_{\pi} N) + \alpha(\mathbb{E}_{\pi} C + \lambda \mathbb{E}_{\pi} N) \\
&= \mathbb{E}_{\pi} C + \lambda \mathbb{E}_{\pi} N.
\end{aligned} \tag{19}$$

The inequality is because $\underline{\pi}$ and $\overline{\pi}$ are both optimal for the
problem (2) with relay price $\lambda$. Thus, we have shown that $\pi_m$
is also optimal for the relaxed problem. Using this along with
$\mathbb{E}_{\pi_m} N = \rho_{avg}$ in Lemma 6, we conclude the proof. ■
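The mixing step in the proof of Theorem 5 amounts to solving one linear equation for the mixing probability and then drawing the policy once, before deployment starts. A minimal sketch follows; the function names and the labels "low" and "high" (for the policy with the smaller and the larger average relay count, respectively) are assumptions made for the example.

```python
import random

def mixing_probability(rho_low, rho_high, rho_avg):
    """Solve (1 - alpha) * rho_low + alpha * rho_high = rho_avg for alpha."""
    if not rho_low < rho_avg < rho_high:
        raise ValueError("rho_avg must lie strictly between rho_low and rho_high")
    return (rho_avg - rho_low) / (rho_high - rho_low)

def choose_policy(rho_low, rho_high, rho_avg, rng=random):
    """Pick one of the two deterministic policies at the start of deployment."""
    alpha = mixing_probability(rho_low, rho_high, rho_avg)
    # With probability alpha use the policy that places more relays on average.
    return "high" if rng.random() < alpha else "low"

# Hypothetical relay averages at a step of the plot, with a target in between.
alpha = mixing_probability(2.0, 3.0, 2.25)
print(alpha, choose_policy(2.0, 3.0, 2.25, random.Random(0)))
```

The randomization happens only once per deployment, so the operative still follows a deterministic threshold rule while walking the path.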

VII. Numerical Work 

For our numerical work we use the one-hop power function
$d(r) = P_m + \gamma r^\eta$, with $P_m = 0.1$, $\gamma = 0.01$. We first study
the effect of parameter variation on the various costs. Next,
we compare the performance of a distance based heuristic with
the optimal.

A. Effect of Parameter Variation 

In Fig. 3 we have already shown an optimal placement
boundary for p = 0.002, q = 0.5, and $\eta$ = 3. Since q = 0.5
the boundary is symmetric about the m = n line.

In Fig. 5 we plot $\mathbb{E}_{\pi^*_\lambda} N$ and $\mathbb{E}_{\pi^*_\lambda} C$ vs. $\lambda$. The plot of
$J_\lambda(0,0)$ vs. $\lambda$ is in Fig. 6. These plots are for p = 0.002
and q = 0.5. Since $\lambda$ is the cost per relay, as expected,
$\mathbb{E}_{\pi^*_\lambda} N$ decreases as $\lambda$ increases. We observe that $\mathbb{E}_{\pi^*_\lambda} C$ and
the optimal total cost $J_\lambda(0,0)$ increase as $\lambda$ increases. A
close examination of Fig. 5 reveals that both plots are
step functions. This is due to the discrete placement at lattice
points, which results in the same placement boundary being
optimal for a range of $\lambda$ values. Thus, as seen in Section VI, at
the $\lambda$ values where there is a jump in $\mathbb{E}_{\pi^*_\lambda} N$, a random mixture
of two policies is needed.

Fig. 7 shows the variation of the total optimal cost $J_\lambda(0,0)$
with q. The variation is symmetric about q = 0.5. For a given
probability p of the path ending, q = 0.5 results in the path
folding frequently. In such a case, since NLOS propagation
is permitted, and the path-loss is isotropic, fewer relays are
required to be placed. On the other hand, when q is close to 0
or to 1 the path takes fewer turns and more relays are needed,
leading to larger values of the total cost.

In Fig. 8 we show the variation of the optimal boundaries
with $\eta$. As $\eta$, the path-loss exponent, increases, the hop cost
increases for a given hop distance. This results in relays
needing to be placed more frequently. As can be seen, the
placement boundaries shrink with increasing $\eta$. We also notice
that the placement boundary for $\eta = 2$ is a straight line; indeed
this provable result holds for $\eta = 2$ for any values of p and q.
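The straight-line boundary for a path-loss exponent of 2 can be checked directly. The following is a sketch, assuming the hop cost $d(r) = P_m + \gamma r^\eta$ of this section and the one-step-look-ahead placement condition $p(\lambda + J_\lambda(0,0)) \le q\Delta_1(m,n) + (1-q)\Delta_2(m,n)$ derived in Appendix C:

```latex
\begin{align*}
d(m,n) &= P_m + \gamma\,(m^2 + n^2), \\
\Delta_1(m,n) &= d(m+1,n) - d(m,n) = \gamma\,(2m+1), \\
\Delta_2(m,n) &= d(m,n+1) - d(m,n) = \gamma\,(2n+1), \\
q\Delta_1(m,n) + (1-q)\Delta_2(m,n) &= \gamma\,\bigl(2qm + 2(1-q)n + 1\bigr).
\end{align*}
% The placement condition is therefore linear in (m, n):
\[
2qm + 2(1-q)n \;\ge\; \frac{p\,\bigl(\lambda + J_\lambda(0,0)\bigr)}{\gamma} - 1 .
\]
```

The right-hand side does not depend on $(m,n)$, so for $0 < q < 1$ the boundary is a line of slope $-q/(1-q)$ in the $(m,n)$ plane, whatever the values of p and q.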
Fig. 9. Boundary of the optimal placement set (OSLA boundary) and
boundary derived from the heuristic policy (p = 0.002, q = 0.5 and $\eta$ = 2).

Fig. 10. Average total power as a function of $\rho$ for the optimal policy
(q = 0.5 and q = 1, which corresponds to the straight line) and for the
heuristic (q = 0.5) for p = 0.002 and $\eta$ = 2.

B. Comparison with the Distance based Heuristic 

We recall from the literature survey in Section I that prior
work invariably proposed the policy of placing a relay after
the RF signal strength from the previous relay dropped below
a threshold. For isotropic propagation (as we have assumed
in this paper), this is equivalent to placing the relay after a
circular boundary is crossed. With this in mind, we obtained
the optimal constant distance placement policy (called the
heuristic hereafter) numerically in a manner similar to what
is described in Section IV-B. A sample result is provided
in Fig. 9 for the parameters p = 0.002, q = 0.5 and
$\eta$ = 2. We observe that if the path were to evolve roughly
Eastward or Northward then the heuristic would result in many
more relays being placed. On the other hand, if the path
evolves diagonally (which has higher probability) then the
two placement boundaries will result in similar placement
decisions.

This observation shows up in Fig. 10 where we show the
cost incurred by the optimal policy (for q = 0.5 and for
q = 1, which corresponds to a straight line corridor) and
the heuristic (q = 0.5) vs. $\rho$ for the constrained problem.
As expected, the cost is much larger for q = 1 since the path
does not fold. We find that for q = 0.5 the optimal placement
boundary and the heuristic provide costs that are almost
indistinguishable at this scale. We have performed simulations
by varying the system parameters and observed the same good
performance of the optimal constant distance placement policy.
This suggests that the heuristic policy performs well provided
that the threshold distance is optimally chosen with respect to
the system parameters.

VIII. Conclusion 

We considered the problem of placing relays on a random
lattice path to optimize a linear combination of average power
cost and average number of relays deployed. The optimal
placement policy was proved to be of threshold nature (Theorem 1).
We further proved the optimality of the OSLA
rule (in Theorem 3). We have also devised an OSLA based
fixed point iteration algorithm (Algorithm 1), which we have
proved to converge to the optimal placement set in a finite
number of steps. Through numerical work we observed that the
performance (in terms of average power incurred for a given
relay constraint) of the optimal policy is close to that of the
distance threshold policy provided that the threshold distance
is optimally chosen with respect to the system parameters.

Appendix A
Proof of Lemmas in Section III

A. Proof of Lemma 1

Proof: Any norm is convex so that the function $g(x,y) =
\sqrt{x^2 + y^2}$ is convex in $(x,y)$. The delay function $d(\cdot)$ is also
assumed to be convex and non-decreasing in its argument.
Hence by using the composition rule [13, Section 3.2.4], we
conclude that the function $d(x,y) = d(\sqrt{x^2 + y^2})$ is convex
in $(x,y) \in \mathbb{R}^2$. ■

B. Proof of Lemma 2

Proof: It is easier to prove the lemma by allowing the
arguments m and n to take values on the real line. We have

$$\Delta_1(x,y) = d(x+\delta, y) - d(x,y).$$

Partially differentiating both sides w.r.t. x, we get

$$\frac{\partial \Delta_1(x,y)}{\partial x} = d_x(x+\delta, y) - d_x(x,y) = \delta\, d_{xx}(\xi, y) \ge 0, \quad \text{where } x < \xi < x+\delta,$$

where the second equality follows from the application of Lagrange's
Mean Value Theorem to the function $d_x(\cdot, y)$ and the inequality
is due to the assumption in (1). This proves that
$\Delta_1(x,y)$ is non-decreasing in x.

To prove that $\Delta_1(x,y)$ is non-decreasing in y, we partially
differentiate $\Delta_1(x,y)$ w.r.t. y and obtain

$$\frac{\partial \Delta_1(x,y)}{\partial y} = d_y(x+\delta, y) - d_y(x,y) = \delta\, d_{xy}(\eta, y) \ge 0, \quad \text{where } x < \eta < x+\delta,$$

where the second equality follows from the application of Lagrange's
Mean Value Theorem to the function $d_y(\cdot, y)$ and the inequality
is due to the assumption in (1). This shows that the function
$\Delta_1(x,y)$ is non-decreasing in both the coordinates x and y.
In a similar way it can also be shown that $\Delta_2(x,y)$ is
non-decreasing in x and y under the assumption made in (1). This
completes the proof. ■

Appendix B
Proof of Theorem 1

We begin by defining $H_\lambda(m,n) := J_\lambda(m,n) - d(m,n)$.
Substituting for $c_p(m,n)$ and $c_{np}(m,n)$ (from (6) and (7),
respectively) into (8) and rearranging we obtain (recall the
definitions of $\Delta_1(m,n)$ and $\Delta_2(m,n)$ from Section II):

$$\mathcal{P}_\lambda = \Bigl\{(m,n) : (1-p)\bigl(qH_\lambda(m+1,n) + (1-q)H_\lambda(m,n+1)\bigr) + q\Delta_1(m,n) + (1-q)\Delta_2(m,n) \ge \lambda + (1-p)qJ_\lambda(1,0) + (1-p)(1-q)J_\lambda(0,1) + p\,d(1)\Bigr\}. \tag{20}$$

Lemma 7: For a fixed $\lambda$, $H_\lambda(m,n)$ is non-decreasing in
both $m \in \mathbb{Z}_+$ and $n \in \mathbb{Z}_+$.

Proof: Consider a sequential relay placement problem
where we have K steps to go. The corridor length is the
minimum of K and of a geometric random variable with
parameter p. The problem can be formulated as a finite horizon
MDP with horizon length K. For any given $(m,n)$, $J_K(m,n)$,
$K \ge 2$, is obtained recursively:

$$\begin{aligned}
J_K(m,n) &= \min\{c_p(m,n),\, c_{np}(m,n)\} \\
&= \min\bigl\{\lambda + d(m,n) + (1-p)qJ_{K-1}(1,0) + pq\,d(1) + (1-p)(1-q)J_{K-1}(0,1) + p(1-q)d(1), \\
&\qquad\; (1-p)qJ_{K-1}(m+1,n) + pq\,d(m+1,n) + (1-p)(1-q)J_{K-1}(m,n+1) + p(1-q)d(m,n+1)\bigr\}.
\end{aligned}$$

For $K = 1$, since a sensor must be placed at the next step,
we have $J_1(m,n) = \min\{\lambda + d(m,n) + d(1),\; q\,d(m+1,n) +
(1-q)\,d(m,n+1)\}$. Therefore,

$$H_1(m,n) := J_1(m,n) - d(m,n) = \min\{\lambda + d(1),\; q\Delta_1(m,n) + (1-q)\Delta_2(m,n)\}.$$

From Lemma 2 it follows that $H_1(m,n)$ is non-decreasing
in both m and n. Now we make the induction hypothesis and
assume that $H_{K-1}(m,n)$ is non-decreasing in m and n. We
have:

$$\begin{aligned}
H_K(m,n) &= J_K(m,n) - d(m,n) \\
&= \min\bigl\{\lambda + (1-p)qJ_{K-1}(1,0) + pq\,d(1) + (1-p)(1-q)J_{K-1}(0,1) + p(1-q)d(1), \\
&\qquad\; (1-p)\bigl(qH_{K-1}(m+1,n) + (1-q)H_{K-1}(m,n+1)\bigr) + q\Delta_1(m,n) + (1-q)\Delta_2(m,n)\bigr\}.
\end{aligned}$$

By the induction hypothesis and Lemma 2 it follows that
$H_K(m,n)$ is non-decreasing in both m and n. The proof is
complete by taking the limit as $K \to \infty$. ■
We are now ready to prove Theorem 1.

Proof of Theorem 1: Referring to (20), and utilizing Lemma 7
and Lemma 2, it follows that for a fixed $n \in \mathbb{Z}_+$, the LHS
(Left Hand Side) of (20), describing the placement set $\mathcal{P}_\lambda$, is
an increasing function of m, while the RHS (Right Hand Side)
is a finite constant. Also, because of the assumed properties
of the function $d(\cdot)$, $\Delta_1(m,n) \to \infty$ as $m \to \infty$, for any
fixed n. Hence it follows that there exists an $m^*(n) \in \mathbb{Z}_+$
such that $(m,n) \in \mathcal{P}_\lambda$ $\forall m \ge m^*(n)$. Hence we may write
$\mathcal{P}_\lambda = \cup_{n \in \mathbb{Z}_+} \{(m,n) : m \ge m^*(n)\}$. The second
characterization follows by similar arguments. ■

Appendix C
Proof of Theorem 3

We require the following lemmas to prove Theorem 3.

Lemma 8: $\mathcal{P}_\lambda \subseteq \underline{\mathcal{P}}_\lambda$.

Proof: Suppose that $(m,n) \in \mathcal{P}_\lambda$. Then from (9), $(m+1,n) \in \mathcal{P}_\lambda$
and from (10), $(m,n+1) \in \mathcal{P}_\lambda$. Since $(m,n) \in \mathcal{P}_\lambda$,
we have from (6), (7) and (8) that

$$\lambda + d(m,n) + (1-p)qJ_\lambda(1,0) + pq\,d(1) + (1-p)(1-q)J_\lambda(0,1) + p(1-q)d(1) \le (1-p)qJ_\lambda(m+1,n) + pq\,d(m+1,n) + (1-p)(1-q)J_\lambda(m,n+1) + p(1-q)d(m,n+1). \tag{21}$$

Also we may argue that at the state $(0,0)$, it is optimal not to
place. Indeed, if it had been optimal to place at the state $(0,0)$,
at the next step, we would return to the same state, viz., $(0,0)$. Now,
because of the stationarity of the optimal policy, we would
keep placing relays at the same point, and since the "relay-cost"
$\lambda \ge 0$ and $d(0,0) > 0$, the expected cost for this policy would
be $\infty$. Hence,

$$J_\lambda(0,0) = (1-p)qJ_\lambda(1,0) + pq\,d(1) + (1-p)(1-q)J_\lambda(0,1) + p(1-q)d(1). \tag{22}$$

Since $(m+1,n) \in \mathcal{P}_\lambda$ and $(m,n+1) \in \mathcal{P}_\lambda$, we have (noticing
that it is optimal to place at these points and utilizing (6) and
(22))

$$J_\lambda(m+1,n) = \lambda + d(m+1,n) + J_\lambda(0,0) \tag{23}$$
$$J_\lambda(m,n+1) = \lambda + d(m,n+1) + J_\lambda(0,0). \tag{24}$$

Now, using (22), (23) and (24) in (21), we obtain:

$$p(\lambda + J_\lambda(0,0)) \le q\Delta_1(m,n) + (1-q)\Delta_2(m,n). \tag{25}$$

This proves that $(m,n) \in \underline{\mathcal{P}}_\lambda$ and hence $\mathcal{P}_\lambda \subseteq \underline{\mathcal{P}}_\lambda$. ■
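Using the above Lemma and from (9), (10), (12), (13) we
can conclude that:

$$n^*(m) \ge \underline{n}(m) \quad \forall m \in \mathbb{Z}_+ \tag{26}$$
$$m^*(n) \ge \underline{m}(n) \quad \forall n \in \mathbb{Z}_+. \tag{27}$$

Lemma 9: If $(m,n) \in \underline{\mathcal{P}}_\lambda$ is such that $(m,n+1) \in \mathcal{P}_\lambda$
and $(m+1,n) \in \mathcal{P}_\lambda$, then $(m,n) \in \mathcal{P}_\lambda$.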

Proof: Since $(m,n) \in \underline{\mathcal{P}}_\lambda$, we have from (11),

$$p(\lambda + J_\lambda(0,0)) \le q\Delta_1(m,n) + (1-q)\Delta_2(m,n). \tag{28}$$

Now $(m,n+1) \in \mathcal{P}_\lambda$ and $(m+1,n) \in \mathcal{P}_\lambda$, hence we have
from (23) and (24):

$$J_\lambda(m+1,n) = \lambda + d(m+1,n) + J_\lambda(0,0)$$
$$J_\lambda(m,n+1) = \lambda + d(m,n+1) + J_\lambda(0,0).$$

The expression (22) is always true. Now using (22) and the
above two equations in inequality (28), we obtain (21), which
proves that $(m,n) \in \mathcal{P}_\lambda$. ■
Lemma 10: If $(m,n) \in \mathcal{P}_\lambda$ (resp. $\underline{\mathcal{P}}_\lambda$), then $(m+k,n) \in
\mathcal{P}_\lambda$ (resp. $\underline{\mathcal{P}}_\lambda$) and $(m,n+k) \in \mathcal{P}_\lambda$ (resp. $\underline{\mathcal{P}}_\lambda$) for any $k \in \mathbb{Z}_+$.

Proof: The proof follows easily because the LHS of (20)
is increasing in both m and n while the RHS is a constant.
Similarly, the RHS of (11) is increasing in both m and n while
the LHS is a constant. ■
We can now prove the main theorem.

Proof of Theorem 3: We need to show that the inequalities
in (26) and (27) are equalities. For any $m \in \mathbb{Z}_+$, suppose
that in (26) $n^*(m) > n^*(m) - 1 \ge \underline{n}(m)$. Then we have the
following inclusions:

$$(m, n^*(m)) \in \mathcal{P}_\lambda, \qquad (m, n^*(m)-1) \in \underline{\mathcal{P}}_\lambda, \qquad (m, n^*(m)-1) \notin \mathcal{P}_\lambda. \tag{29}$$

Let us index the collection of lattice points $(m+i, n^*(m)-1)$
by $N_i$, $i \in \mathbb{Z}_+$. Since $(m, n^*(m)-1) \in \underline{\mathcal{P}}_\lambda$, from Lemma 10
it follows that $N_i \in \underline{\mathcal{P}}_\lambda$. From (29), $N_0 \notin \mathcal{P}_\lambda$.
Then, the optimal policy being a threshold policy, we know
that there exists a finite $k > 0$ s.t. $N_k \in \mathcal{P}_\lambda$, i.e.,

$$(m+k, n^*(m)-1) \in \mathcal{P}_\lambda. \tag{30}$$

Again from Lemma 10, since $(m, n^*(m)) \in \mathcal{P}_\lambda$, we have for
any $k > 0$:

$$(m+k-1, n^*(m)) \in \mathcal{P}_\lambda. \tag{31}$$

Now we see that for the point $N_{k-1}$, the conditions of
Lemma 9 are satisfied. Hence $N_{k-1} \in \mathcal{P}_\lambda$. If $k = 1$, we
already have a contradiction since $N_0 \notin \mathcal{P}_\lambda$. Otherwise, for
$k > 1$, using Lemma 10 and $N_{k-1} \in \mathcal{P}_\lambda$, we can show that
$N_{k-2}$ is subject to the conditions of Lemma 9, implying that
$N_{k-2} \in \mathcal{P}_\lambda$. By iteration, we finally obtain that $N_0 \in \mathcal{P}_\lambda$,
which contradicts (29) and proves the result.



Appendix D
Proof of Lemma 3

We start by showing the following lemma.

Lemma 11: For any placement set $\mathcal{P}(h)$ of the form in (14),
we have:

$$\sum_{(m,n) \in \mathcal{P}^c(h)} r(m,n)\bigl(\Delta_q(m,n) - p(\lambda + g(h))\bigr) + d(0,0) + \lambda = 0, \tag{32}$$

where $r(m,n) = (1-p)^{m+n} \binom{m+n}{m} q^m (1-q)^n$.

Proof: We first introduce some notation and definitions.
Let us define a path $\sigma$ as a possible realization of the
corridor, starting from $(0,0)$, and let $P(\sigma)$ be the probability
of such a path. The set of all paths is denoted by $\Sigma$. Let $\Sigma_{mn}$
denote the set of all paths that end at $(m,n) \in \mathcal{P}^c(h) \cup \mathcal{B}(h)$
and $\Sigma_{mn}(c)$ the set of all paths that hit $(m,n) \in \mathcal{B}(h)$ and
continue.

Let us denote the set of edges whose end vertices both
belong to the set $\mathcal{P}^c(h) \cup \mathcal{B}(h)$ by $E$. A path $\sigma$ is completely
characterized by its edge set $E_\sigma$.

The reaching probability, $r(m,n)$, of a point $(m,n)$ is
defined as the probability that a random path $\sigma$ reaches the
point $(m,n)$ and continues for at least one step. Hence,
$r(m,n) = (1-p)^{m+n} \binom{m+n}{m} q^m (1-q)^n$.
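The closed form for the reaching probability can be cross-checked against the one-step dynamics of the path. The sketch below assumes that from any state the path survives the next step with probability $1-p$ and moves East with probability $q$ or North with probability $1-q$, which gives the recursion $r(m,n) = (1-p)\bigl(q\,r(m-1,n) + (1-q)\,r(m,n-1)\bigr)$ with $r(0,0) = 1$. The function names and the parameter values are hypothetical.

```python
import math

def reach_prob(m, n, p, q):
    """Closed form r(m, n) = (1-p)^(m+n) * C(m+n, m) * q^m * (1-q)^n."""
    return (1 - p) ** (m + n) * math.comb(m + n, m) * q ** m * (1 - q) ** n

def reach_prob_table(size, p, q):
    """Build r(m, n) for 0 <= m, n < size from the one-step recursion."""
    r = [[0.0] * size for _ in range(size)]
    r[0][0] = 1.0  # the path starts at the origin and has not ended yet
    for m in range(size):
        for n in range(size):
            if m == 0 and n == 0:
                continue
            from_west = q * r[m - 1][n] if m > 0 else 0.0
            from_south = (1 - q) * r[m][n - 1] if n > 0 else 0.0
            r[m][n] = (1 - p) * (from_west + from_south)
    return r

table = reach_prob_table(8, p=0.02, q=0.3)
worst = max(abs(table[m][n] - reach_prob(m, n, 0.02, 0.3))
            for m in range(8) for n in range(8))
print("largest deviation from the closed form:", worst)
```

The binomial coefficient counts the lattice paths from the origin to $(m,n)$, and each such path has the same probability, which is why the recursion and the closed form agree.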

The incremental cost function $\delta : E \to \mathbb{R}_+$ is defined as
follows:

$$\delta(e) = \begin{cases}
d(m+1,n) - d(m,n) = \Delta_1(m,n) & \text{if } e = \{(m,n),(m+1,n)\}, \\
d(m,n+1) - d(m,n) = \Delta_2(m,n) & \text{if } e = \{(m,n),(m,n+1)\}.
\end{cases} \tag{33}$$

For $(m,n) \in \sigma$, the incremental cost function allows us to
write:

$$d(m,n) = \sum_{e \in E_\sigma \cap E} \delta(e) + d(0,0). \tag{34}$$



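Now consider

$$\begin{aligned}
&\sum_{\mathcal{P}^c(h) \cup \mathcal{B}(h)} P((m,n),e)\,d(m,n) + \sum_{\mathcal{B}(h)} P((m,n),c)\,d(m,n) \\
&\quad= \sum_{\mathcal{P}^c(h) \cup \mathcal{B}(h)} \sum_{\sigma \in \Sigma_{mn}} P(\sigma)\Bigl(\sum_{e \in E_\sigma \cap E} \delta(e) + d(0,0)\Bigr)
 + \sum_{\mathcal{B}(h)} \sum_{\sigma \in \Sigma_{mn}(c)} P(\sigma)\Bigl(\sum_{e \in E_\sigma \cap E} \delta(e) + d(0,0)\Bigr) \\
&\quad= \sum_{e \in E} \delta(e) \sum_{\sigma : e \in E_\sigma} P(\sigma) + d(0,0) \\
&\quad= \sum_{e \in E} \delta(e)\,t(e) + d(0,0),
\end{aligned} \tag{35}$$

where by $t(e)$ we denote the probability that a random path
goes through the edge $e \in E$.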

Now if e is horizontal, i.e., $e = \{(m,n),(m+1,n)\}$,
$(m,n) \in \mathcal{P}^c(h)$, we have $t(e) = q\,r(m,n)$ and $\delta(e) =
\Delta_1(m,n)$. Similarly if e is vertical, i.e., $e = \{(m,n),(m,n+1)\}$,
$(m,n) \in \mathcal{P}^c(h)$, we have $t(e) = (1-q)\,r(m,n)$ and
$\delta(e) = \Delta_2(m,n)$. Using these relations, we may rewrite (35)
as follows:

$$\sum_{\mathcal{P}^c(h)} r(m,n)\bigl(q\Delta_1(m,n) + (1-q)\Delta_2(m,n)\bigr) + d(0,0) = \sum_{\mathcal{P}^c(h)} r(m,n)\Delta_q(m,n) + d(0,0). \tag{36}$$

Now consider the probability $\sum_{(m,n) \in \mathcal{B}(h)} P((m,n),c)$. It
is the probability that a random path continues beyond the
boundary $\mathcal{B}(h)$. Hence we may write

$$\sum_{\mathcal{B}(h)} P((m,n),c) = 1 - \sum_{\mathcal{P}^c(h) \cup \mathcal{B}(h)} P((m,n),e) = 1 - \sum_{\mathcal{P}^c(h)} r(m,n)\,p. \tag{37}$$

Using (36) and (37) in (17) and simplifying, we obtain the
result. ■
Proof of Lemma 3:
We recall the definition of $\mathcal{P}^c(h)$:

$$\mathcal{P}^c(h) = \{(m,n) \in \mathbb{Z}_+^2 : p(\lambda + h) > \Delta_q(m,n)\}. \tag{38}$$

Since $h > g^*$, we immediately conclude that $\mathcal{P}^c_\lambda \subseteq \mathcal{P}^c(h)$.



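From (32) in Lemma 11 we may write for the optimal
placement set $\mathcal{P}_\lambda$:

$$\sum_{\mathcal{P}^c_\lambda} r(m,n)\Delta_q(m,n) = p(\lambda + g^*) \sum_{\mathcal{P}^c_\lambda} r(m,n) - (d(0,0) + \lambda). \tag{39}$$

We may similarly write for the placement set $\mathcal{P}(h)$:

$$\sum_{\mathcal{P}^c(h)} r(m,n)\Delta_q(m,n) = p(\lambda + g(h)) \sum_{\mathcal{P}^c(h)} r(m,n) - (d(0,0) + \lambda). \tag{40}$$

Now, since $\mathcal{P}^c_\lambda \subseteq \mathcal{P}^c(h)$, we may expand the LHS of (40) as
follows:

$$\begin{aligned}
\sum_{\mathcal{P}^c(h)} r(m,n)\Delta_q(m,n)
&= \sum_{\mathcal{P}^c_\lambda} r(m,n)\Delta_q(m,n) + \sum_{\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda} r(m,n)\Delta_q(m,n) \\
&\le \sum_{\mathcal{P}^c_\lambda} r(m,n)\Delta_q(m,n) + p(\lambda + h) \sum_{\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda} r(m,n) \\
&= p(\lambda + g^*) \sum_{\mathcal{P}^c_\lambda} r(m,n) - (d(0,0) + \lambda) + p(\lambda + h) \sum_{\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda} r(m,n),
\end{aligned} \tag{41}$$

where, for the inequality, we used (38) and, for the last equality, we have
substituted the value for the quantity from (39). We may
alternatively write the RHS of (40) as:

$$p(\lambda + g(h)) \sum_{\mathcal{P}^c(h)} r(m,n) - (d(0,0) + \lambda) = p(\lambda + g(h)) \Bigl(\sum_{\mathcal{P}^c_\lambda} r(m,n) + \sum_{\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda} r(m,n)\Bigr) - (d(0,0) + \lambda). \tag{42}$$

Now comparing (41) and (42) and rearranging, we may write:

$$p(g(h) - g^*) \sum_{\mathcal{P}^c_\lambda} r(m,n) \le p(h - g(h)) \sum_{\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda} r(m,n). \tag{43}$$

Now $\sum_{\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda} r(m,n) = 0$ if and only if $\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda = \emptyset$,
i.e., $\mathcal{P}(h) = \mathcal{P}_\lambda$. In this case we get $g(h) = g^* < h$. On the
other hand, if $\sum_{\mathcal{P}^c(h) \setminus \mathcal{P}^c_\lambda} r(m,n) > 0$, the inequality in (38) is strict
on a non-empty set, so (43) holds strictly; since $g^* \le g(h)$, from
the inequality (43), we conclude that $h > g(h)$. ■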



